12 research outputs found
Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability
[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...
What Makes a Language Easy to Deep-Learn?
Neural networks drive the success of natural language processing. A
fundamental property of language is its compositional structure, allowing
humans to produce forms for new meanings systematically. However, unlike
humans, neural networks notoriously struggle with systematic generalization,
and do not necessarily benefit from compositional structure in emergent
communication simulations. This poses a problem for using neural networks to
simulate human language learning and evolution, and suggests crucial
differences in the biases of the different learning systems. Here, we directly
test how neural networks compare to humans in learning and generalizing
different input languages that vary in their degree of structure. We evaluate
the memorization and generalization capabilities of a pre-trained language
model GPT-3.5 (analogous to an adult second language learner) and recurrent
neural networks trained from scratch (analogous to a child first language
learner). Our results show striking similarities between deep neural networks
and adult human learners, with more structured linguistic input leading to more
systematic generalization and to better convergence between neural networks and
humans. These findings suggest that all the learning systems are sensitive to
the structure of languages in similar ways with compositionality being
advantageous for learning. Our findings draw a clear prediction regarding
children's learning biases, as well as highlight the challenges of automated
processing of languages spoken by small communities. Notably, the similarity
between humans and machines opens new avenues for research on language learning
and evolution.
Comment: 32 pages, major update: improved text, added new analyses, added
supplementary materia
Open-World Lifelong Graph Learning
We study the problem of lifelong graph learning in an open-world scenario,
where a model needs to deal with new tasks and potentially unknown classes. We
utilize Out-of-Distribution (OOD) detection methods to recognize new classes
and adapt existing non-graph OOD detection methods to graph data. Crucially, we
suggest performing new class detection by combining OOD detection methods with
information aggregated from the graph neighborhood. Most OOD detection methods
avoid determining a crisp threshold for deciding whether a vertex is OOD. To
tackle this problem, we propose a Weakly-supervised Relevance Feedback
(Open-WRF) method, which decreases the sensitivity to thresholds in OOD
detection. We evaluate our approach on six benchmark datasets. Our results show
that the proposed neighborhood aggregation method for OOD scores outperforms
existing methods independent of the underlying graph neural network.
Furthermore, we demonstrate that our Open-WRF method is more robust to
threshold selection and analyze the influence of graph neighborhood on OOD
detection. The aggregation and threshold methods are compatible with arbitrary
graph neural networks and OOD detection methods, making our approach versatile
and applicable to many real-world applications.
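The idea of combining a vertex's own OOD score with information from its graph neighborhood can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so the blending weight `alpha`, the use of a plain neighbor mean, and the function name `aggregate_ood_scores` are assumptions made purely for illustration.

```python
def aggregate_ood_scores(scores, neighbors, alpha=0.5):
    """Blend each vertex's own OOD score with the mean score of its neighbors.

    scores:    dict mapping vertex -> OOD score from any base detector
    neighbors: dict mapping vertex -> list of adjacent vertices
    alpha:     hypothetical weight on the vertex's own score (assumption)
    """
    aggregated = {}
    for vertex, own_score in scores.items():
        adjacent = neighbors.get(vertex, [])
        if adjacent:
            neighbor_mean = sum(scores[u] for u in adjacent) / len(adjacent)
            aggregated[vertex] = alpha * own_score + (1 - alpha) * neighbor_mean
        else:
            # Isolated vertices keep their own score unchanged.
            aggregated[vertex] = own_score
    return aggregated


# Toy graph: "a" looks OOD in isolation, but its neighbors look in-distribution.
scores = {"a": 0.9, "b": 0.1, "c": 0.2, "d": 0.8}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
agg = aggregate_ood_scores(scores, neighbors)
print(agg)
```

In this toy graph the score of vertex `a` is pulled toward that of its in-distribution neighbors, while the isolated vertex `d` is unaffected. Because the function only consumes scores, it is independent of which graph neural network or OOD detector produced them.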
CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model
Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW's word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2
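The order-sensitivity argument can be checked with a small sketch: summing word vectors is commutative, whereas multiplying word matrices generally is not. The toy vectors and matrices below are made-up values rather than trained embeddings, and the helper names `cbow` and `cmow` are illustrative only. CBOW is shown as a plain sum; averaging would be order-invariant in the same way.

```python
from functools import reduce


def add_vectors(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Toy embeddings (assumed values, not learned).
vec = {"X": [1, 0], "Y": [0, 1], "Z": [2, 3]}
mat = {
    "X": [[1, 1], [0, 1]],
    "Y": [[1, 0], [1, 1]],
    "Z": [[2, 0], [0, 1]],
}


def cbow(words):
    """Order-invariant composition: sum of word vectors."""
    return reduce(add_vectors, (vec[w] for w in words))


def cmow(words):
    """Order-sensitive composition: product of word matrices."""
    return reduce(matmul, (mat[w] for w in words))


print(cbow("XYZ") == cbow("ZYX"))  # sums agree regardless of order
print(cmow("XYZ") == cmow("ZYX"))  # matrix products differ here
```

A hybrid representation in the spirit of the abstract would concatenate both outputs, so that the sum carries word content and the product carries order information.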
Bag-of-Words vs. Sequence vs. Graph vs. Hierarchy for Single- and Multi-Label Text Classification
Graph neural networks have triggered a resurgence of graph-based text
classification methods, defining today's state of the art. We show that a
simple multi-layer perceptron (MLP) using a Bag of Words (BoW) outperforms the
recent graph-based models TextGCN and HeteGCN in an inductive text
classification setting and is comparable with HyperGAT in single-label
classification. We also run our own experiments on multi-label classification,
where the simple MLP outperforms the recent sequence-based gMLP and aMLP
models. Moreover, we fine-tune a sequence-based BERT and a lightweight
DistilBERT model, which both outperform all models on both single-label and
multi-label settings in most datasets. These results question the importance of
synthetic graphs used in modern text classifiers. In terms of parameters,
DistilBERT is still twice as large as our BoW-based wide MLP, while graph-based
models like TextGCN require setting up an $\mathcal{O}(N^2)$ graph, where $N$
is the vocabulary plus corpus size.
Comment: arXiv admin note: substantial text overlap with arXiv:2109.0377
Using Titles vs. Full-text as Source for Automated Semantic Document Annotation
We conduct the first systematic comparison of automated semantic annotation based on either the full text or only the title metadata of documents. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. Across three of our four datasets, classification using only titles reaches over 90% of the quality obtained when using the full text.
Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation
The task of automatic playlist continuation is to generate a list of recommended tracks that can be added to an existing playlist. By suggesting appropriate tracks, i.e., songs to add to a playlist, a recommender system can increase user engagement by making playlist creation easier, as well as by extending listening beyond the end of the current playlist. The ACM Recommender Systems Challenge 2018 focuses on this task. Spotify released a dataset that includes a large number of playlists and associated track listings. Given a set of playlists from which a number of tracks have been withheld, the goal is to predict the missing tracks in those playlists. We participated in the challenge as the team Unconscious Bias and, in this paper, we present our approach. We extend adversarial autoencoders to the problem of automatic playlist continuation. We show how multiple input modalities, such as the playlist titles as well as track titles, artists, and albums, can be incorporated in the playlist continuation task.
Multi-modal adversarial autoencoders for recommendations of citations and subject labels
We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence: item co-occurrence resembles relatedness in the case of citations, yet implies diversity in the case of subject labels. Our results reveal that supplying the partial item set as input is only helpful when item co-occurrence resembles relatedness. When facing a new recommendation task, it is therefore crucial to consider the semantics of item co-occurrence when choosing an appropriate model.
Lifelong learning on evolving graphs under the constraints of imbalanced classes and new classes
Lifelong graph learning deals with the problem of continually adapting graph neural network (GNN) models to changes in evolving graphs. We address two critical challenges of lifelong graph learning in this work: dealing with new classes and tackling imbalanced class distributions. The combination of these two challenges is particularly relevant since newly emerging classes typically represent only a tiny fraction of the data, adding to the already skewed class distribution. We make several contributions: First, we show that the amount of unlabeled data does not influence the results, which is an essential prerequisite for lifelong learning on a sequence of tasks. Second, we experiment with different label rates and show that our methods can perform well with only a tiny fraction of annotated nodes. Third, we propose the gDOC method to detect new classes under the constraint of having an imbalanced class distribution. The critical ingredient is a weighted binary cross-entropy loss function to account for the class imbalance. Moreover, we demonstrate combinations of gDOC with various base GNN models such as GraphSAGE, Simplified Graph Convolution, and Graph Attention Networks. Lastly, our k-neighborhood time difference measure provably normalizes the temporal changes across different graph datasets. With extensive experimentation, we find that the proposed gDOC method is consistently better than a naive adaptation of DOC to graphs. Specifically, in experiments using the smallest history size, the out-of-distribution detection score of gDOC is 0.09 compared to 0.01 for DOC. Furthermore, gDOC achieves an Open-F1 score, a combined measure of in-distribution classification and out-of-distribution detection, of 0.33 compared to 0.25 for DOC (32% increase).
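A weighted binary cross-entropy loss of the kind mentioned above can be sketched as follows. The abstract does not specify how the weights are chosen, so the inverse-frequency weight on the positive term (`pos_weight = negatives / positives`) and the function name `weighted_bce` are assumptions for illustration, not the gDOC definition.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with an extra weight on the positive term.

    pos_weight > 1 makes errors on the rare positive class cost more.
    With pos_weight == 1 this reduces to the ordinary binary cross-entropy.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# One positive among four examples, and the model under-predicts it.
labels = [1, 0, 0, 0]
probs = [0.3, 0.2, 0.1, 0.2]

# Hypothetical weighting scheme: ratio of negatives to positives.
pos_weight = labels.count(0) / labels.count(1)

plain = weighted_bce(labels, probs)
weighted = weighted_bce(labels, probs, pos_weight=pos_weight)
print(plain, weighted)
```

With one positive among four examples, the weighted loss penalizes the under-predicted positive three times as strongly as the unweighted loss does, which is the effect needed when a newly emerging class makes up only a small share of the nodes.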